TextPipe: Online Help
    Regular Expression Hints
 


 

Introduction

Regular expressions are a powerful, but sometimes tricky, part of TextPipe. The good news is that you don't need to be a rocket scientist to use them. The simple ones are just that -- fairly simple -- and they are all you need to understand for many applications. If you're relatively unfamiliar with regular expressions -- regexps for short -- be sure to read the section Tips on Seat-of-the-Pants Pattern Matching below.

If, on the other hand, you are already comfortable using regular expressions, you won't find much of value in the Tips section, but may find subsequent tables and examples useful. Our focus is on items, like quantifiers, that you keep looking up in your reference materials because you don't use them often enough to remember the fine points.

Tips on Seat-of-the-Pants Pattern Matching

These are tips, not a tutorial. We assume you already know how to code simple substitutions with regular expressions.

The irony of pattern matching with regular expressions is that although it is a fully deterministic operation, that doesn't help you much in practice. The pattern matching engine follows a complex set of rules that are hard to understand and difficult to apply to practical problems of any complexity -- even for full-time Web page developers, much less part-timers. Fortunately, you can mostly set the formal matching rules aside and take a more practical, seat-of-the-pants approach.

With this as background, here are our tips: 

  1. Never overwrite your input file. There's no Undo function as there is with your favorite word processor. It's easy, oh so easy, to make coding errors in substitution statements, even when you have experience coding them. 
  2. If you're using the don't-care character sequence (".*" or ".+") in your pattern, watch out for greedy matching: the sequence consumes as much text as it can, so the match typically ends at the right-most occurrence of whatever follows it rather than the left-most. See the section Quantifiers for examples. 
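To make the greedy-matching tip concrete, here is a minimal sketch. It uses Python's re module as a stand-in, on the assumption that it treats ".*" the same way for this simple case; TextPipe's own engine may differ in other details. The sample string and the lazy form ".*?" are illustrative additions, not taken from TextPipe.

```python
import re

# Hypothetical sample text with three occurrences of "key=".
text = "key=alpha; key=beta; key=gamma"

# Greedy: ".*" consumes as much as it can, so the match
# runs through the LAST "key=" in the string.
greedy = re.search(r".*key=", text)
print(greedy.group(0))

# Lazy: ".*?" consumes as little as it can, so the match
# stops at the FIRST "key=" (assumes the engine supports "?"
# after a quantifier, as Python does).
lazy = re.search(r".*?key=", text)
print(lazy.group(0))
```

If a pattern swallows more text than expected, a leading ".*" or ".+" is the first thing to check.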

Metacharacters

The following characters -- so-called metacharacters -- have special meaning in regular expressions. To match them, precede them with a backslash.

Metacharacter   Purpose
\               first character of assertions, such as \w or \d
|               OR; allows matching alternatives
.               match any character
( ... )         grouping operator; builds $1, $2, etc.
[ ... ]         match any character within brackets
+               quantifier: match one or more times
?               quantifier: at most one match
*               quantifier: match zero or more times
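The backslash rule can be sketched as follows. The example uses Python's re module rather than TextPipe, and the sample text is made up for illustration; re.escape is a Python convenience, not a TextPipe feature.

```python
import re

# Hypothetical sample text containing several metacharacters.
text = "Total (net): 3.50 + 1.25"

# "\." matches a literal period; an unescaped "." would match any character.
prices = re.findall(r"\d\.\d\d", text)
print(prices)

# "\(" and "\)" match literal parentheses; the unescaped inner pair
# still acts as the grouping operator.
in_parens = re.search(r"\((\w+)\)", text)
print(in_parens.group(1))

# re.escape() puts a backslash in front of every metacharacter for you,
# which is handy when the search text comes from somewhere else.
literal = re.escape("3.50 + 1.25")
found = re.search(literal, text) is not None
print(found)
```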

Assertions identify special conditions for a pattern match. The following table covers commonly used assertions.

Common Assertions
Assertion   Function
\d          digit [0-9]
\d+         number
\D          nondigit
\s          white-space
\S          non-white-space
\w          alphanumeric character [a-zA-Z_0-9]
\w+         alphanumeric word
\W          nonalphanumeric character
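A short sketch of a few of these assertions in action, again using Python's re module as a stand-in for TextPipe. The re.ASCII flag is passed so that \d and \w are limited to the ranges given in the table; without it, Python also matches non-ASCII digits and letters. The sample text is hypothetical.

```python
import re

# Hypothetical sample text.
text = "Order 66 shipped to Bay_7 on 2005-03-14"

# \d+ : each run of one or more digits.
numbers = re.findall(r"\d+", text, flags=re.ASCII)
print(numbers)

# \w+ : each run of letters, digits, and underscores.
words = re.findall(r"\w+", text, flags=re.ASCII)
print(words)

# \s : remove every white-space character.
squeezed = re.sub(r"\s", "", text)
print(squeezed)
```

Note that the underscore counts as a \w character, so Bay_7 comes back as a single word.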

Quantifiers

This section complements discussions of quantifiers that are long on words but short on examples. We examine a host of variations on a basic example, using the various quantifiers in the following scenario: 

Search for

(pe)

Replace with

--

Target string

Pepe pens Perl pearls. 

We use parentheses in all examples for consistency. They only matter here when followed by a quantifier. Thus (pe)+ looks for one or more successive occurrences of pe, whereas pe+ looks for p followed by one or more occurrences of e.
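The difference between (pe)+ and pe+ is easiest to see on a string that contains both repeated "pe" and repeated "e". The sketch below uses Python's re module and a made-up test string; it is an illustration, not TextPipe output.

```python
import re

# Hypothetical test string: "pe" three times, then "p" with three "e"s.
text = "pepepe peee"

# (pe)+ : the quantifier applies to the whole group "pe".
grouped = [m.group(0) for m in re.finditer(r"(pe)+", text)]
print(grouped)

# pe+ : the quantifier applies only to the "e".
ungrouped = [m.group(0) for m in re.finditer(r"pe+", text)]
print(ungrouped)
```

With the group, "pepepe" is one match and "peee" contributes only "pe". Without it, "pepepe" splits into three matches and "peee" is matched whole.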

Substitutions shown in red merit close attention.

Search string Case sensitive? Result Notes

No quantifier (i.e., exactly one match)

Pepe pens Perl pearls.
(pe) Y Pe-- --ns Perl --arls. 
(pe) N ---- --ns --rl --arls. 

Plus quantifier (one or more matches)

Pepe pens Perl pearls.
(pe)+ Y Pe-- --ns Perl --arls. 
(pe)+ N -- --ns --rl --arls. 

Question mark quantifier (at most one match)

Pepe pens Perl pearls.
(pe)? Y --P--e---- ----n--s-- --P- etc. 

Asterisk quantifier (zero or more matches)

Pepe pens Perl pearls.
(pe)* Y --P--e---- ----n--s-- --P- etc. 

Leading don't-care characters

Pepe pens Perl pearls.
.*(pe) Y --arls. Greedy match
.*(Pe) Y --rl pearls. Greedy match
.+(Pe) Y --rl pearls. 
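The rows above can be reproduced outside TextPipe with a few lines of Python. This is a sketch under the assumption that Python's re module behaves like TextPipe's engine for these patterns; the helper name substitute is hypothetical. The (pe)? and (pe)* rows are left out because the handling of empty matches varies between engines and versions.

```python
import re

TARGET = "Pepe pens Perl pearls."

def substitute(pattern, case_sensitive=True):
    """Replace every match of pattern in TARGET with "--"."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.sub(pattern, "--", TARGET, flags=flags)

# No quantifier
print(substitute(r"(pe)"))                         # Pe-- --ns Perl --arls.
print(substitute(r"(pe)", case_sensitive=False))   # ---- --ns --rl --arls.

# Plus quantifier: "Pepe" collapses to a single "--" when case is ignored.
print(substitute(r"(pe)+", case_sensitive=False))  # -- --ns --rl --arls.

# Leading don't-care characters: ".*" is greedy, so everything up to
# the last "pe" (or last "Pe") is swallowed by one match.
print(substitute(r".*(pe)"))                       # --arls.
print(substitute(r".*(Pe)"))                       # --rl pearls.
print(substitute(r".+(Pe)"))                       # --rl pearls.
```

Running a candidate pattern against a short sample string like this, before applying it to real files, is a cheap way to catch greedy-match surprises.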

See also

Regular expressions

 © 1999-2005 Crystal Software. All rights reserved.